Temporal workload-aware replicated partitioning for social networks
The most frequent and expensive queries in social networks involve multi-user operations such as requesting the latest tweets or news feeds of friends. The performance of such queries is heavily dependent on the data partitioning and replication methodologies adopted by the underlying systems. Existing solutions for data distribution in these systems involve hash- or graph-based approaches that ignore the multi-way relations among data. In this work, we propose a novel data partitioning and selective replication method that utilizes the temporal information in prior workloads to predict future query patterns. Our method utilizes the social network structure and the temporality of the interactions among its users to construct a hypergraph that correctly models multi-user operations. It then performs simultaneous partitioning and replication of this hypergraph to reduce the query span while respecting load balance and I/O load constraints under replication. To test our model, we enhance the Cassandra NoSQL system to support selective replication and implement a social network application (a Twitter clone) on top of our enhanced Cassandra. We conduct experiments in a cloud computing environment (Amazon EC2) to evaluate the developed systems. Comparison of the proposed method with hash- and enhanced graph-based schemes indicates that it significantly improves latency and throughput.
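The hypergraph framing above can be sketched in a few lines: each multi-user query (e.g., a news-feed fetch) is a hyperedge over the users whose data it reads, and the "query span" being minimized is the number of partitions that hyperedge touches. A minimal illustration, with invented names rather than the paper's implementation:

```python
# Hypothetical sketch: multi-user queries as hyperedges, and the query
# span of each under a given user-to-partition assignment.

def query_span(partition, queries):
    """Map each query to the number of distinct partitions it touches."""
    return {q: len({partition[u] for u in users}) for q, users in queries.items()}

# Each news-feed query is a hyperedge over the users whose data it reads.
queries = {
    "feed_alice": ["alice", "bob", "carol"],  # alice reads her friends' tweets
    "feed_bob":   ["bob", "alice", "dave"],
}
partition = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

spans = query_span(partition, queries)
# feed_alice touches partitions {0, 1}, so its span is 2
```

Partitioning (and selectively replicating) the hypergraph so that most hyperedges fall within one partition is what keeps such spans low.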
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices
Identifying similar protein sequences is a core step in many computational
biology pipelines such as detection of homologous protein sequences, generation
of similarity protein graphs for downstream analysis, functional annotation and
gene location. Performance and scalability of protein similarity searches have
proven to be a bottleneck in many bioinformatics pipelines due to increases in
cheap and abundant sequencing data. This work presents a new distributed-memory
software, PASTIS. PASTIS relies on sparse matrix computations for efficient
identification of possibly similar proteins. We use distributed sparse matrices
for scalability and show that the sparse matrix infrastructure is a great fit
for protein similarity searches when coupled with a fully-distributed
dictionary of sequences that allows remote sequence requests to be fulfilled.
Our algorithm incorporates the unique bias in amino acid sequence substitution
in searches without altering the basic sparse matrix model, and in turn,
achieves ideal scaling up to millions of protein sequences.
Comment: To appear in the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'20).
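The core idea of casting many-to-many similarity search as sparse matrix computation can be illustrated with a toy version (not PASTIS code): build a sparse sequence-by-k-mer incidence matrix A, and compute A Aᵀ so that entry (i, j) counts the distinct k-mers sequences i and j share, flagging candidate pairs for alignment.

```python
# Illustrative sketch only: a sparse matrix as dict of row -> {col: value},
# and A @ A.T computed SpGEMM-style by accumulating over shared columns.

def kmer_rows(seqs, k=3):
    """Row i holds a 1 for every k-mer appearing in sequence i."""
    return {i: {s[j:j + k]: 1 for j in range(len(s) - k + 1)}
            for i, s in enumerate(seqs)}

def spgemm_AAt(rows):
    """Sparse A @ A.T: nonzeros arise only where rows share a column."""
    out = {}
    for i, ri in rows.items():
        for j, rj in rows.items():
            shared = sum(v * rj[c] for c, v in ri.items() if c in rj)
            if shared:
                out[(i, j)] = shared
    return out

seqs = ["MKTAYIAK", "MKTAYLLK", "GGGSGGGS"]
C = spgemm_AAt(kmer_rows(seqs))
# C[(0, 1)] == 3: the first two sequences share the k-mers MKT, KTA, TAY;
# the unrelated third sequence produces no entry with either.
```

In PASTIS the matrices are distributed, and the substitution-matrix bias is folded into the entries without changing this basic structure.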
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
HipMCL is a high-performance distributed memory implementation of the popular
Markov Cluster Algorithm (MCL) and can cluster large-scale networks within
hours using a few thousand CPU-equipped nodes. It relies on sparse matrix
computations and heavily makes use of the sparse matrix-sparse matrix
multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL are
not scalable to Exascale architectures, both due to their communication costs
dominating the runtime at large concurrencies and also due to their inability
to take advantage of accelerators that are increasingly popular.
In this work, we systematically remove scalability and performance
bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion
phase of the MCL algorithm on GPU. We propose a CPU-GPU joint distributed
SpGEMM algorithm called pipelined Sparse SUMMA and integrate a probabilistic
memory requirement estimator that is fast and accurate. We develop a new
merging algorithm for the incremental processing of partial results produced by
the GPUs, which improves the overlap efficiency and the peak memory usage. We
also integrate a recent and faster algorithm for performing SpGEMM on CPUs. We
validate our new algorithms and optimizations with extensive evaluations. With
the enabling of the GPUs and integration of new algorithms, HipMCL is up to
12.4x faster, able to cluster a network with 70 million proteins and 68
billion connections in just under 15 minutes using 1024 nodes of ORNL's Summit
supercomputer.
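The MCL iteration that HipMCL distributes alternates two steps: expansion, a matrix-matrix product (the SpGEMM kernel the abstract discusses), and inflation, an elementwise power followed by column renormalization. A dense, single-node sketch with illustrative parameter names:

```python
# Minimal dense sketch of one MCL iteration (HipMCL does this with
# distributed sparse matrices and GPU-accelerated SpGEMM).

def normalize_cols(M):
    """Make each column a probability distribution."""
    n = len(M)
    for j in range(n):
        s = sum(M[i][j] for i in range(n)) or 1.0
        for i in range(n):
            M[i][j] /= s
    return M

def mcl_step(M, inflation=2.0):
    n = len(M)
    # Expansion: M @ M, i.e. flow of a two-step random walk.
    E = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Inflation: boost strong flows, suppress weak ones, renormalize.
    E = [[v ** inflation for v in row] for row in E]
    return normalize_cols(E)

# Two disconnected 2-node cliques (with self-loops): flow never crosses
# between the blocks, and within-block flow settles at 0.5.
M = normalize_cols([[1.0, 1.0, 0, 0], [1.0, 1.0, 0, 0],
                    [0, 0, 1.0, 1.0], [0, 0, 1.0, 1.0]])
for _ in range(5):
    M = mcl_step(M)
```

Clusters are then read off as the connected components of the converged matrix's nonzero pattern.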
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
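The hashing motif argued for above can be made concrete with a toy version of distributed k-mer counting: each k-mer is hashed to an "owner" bucket, mimicking how updates to a shared counting structure are routed asynchronously to the processor that owns each key.

```python
# Toy sketch of the hashing motif: hash-partitioned k-mer counting.
from collections import Counter

def owner(kmer, nprocs):
    return hash(kmer) % nprocs  # which processor owns this k-mer

def distributed_kmer_count(reads, k, nprocs):
    buckets = [Counter() for _ in range(nprocs)]
    for read in reads:
        for i in range(len(read) - k + 1):
            km = read[i:i + k]
            buckets[owner(km, nprocs)][km] += 1  # update routed to the owner
    return buckets

reads = ["ACGTACGT", "CGTACG"]
buckets = distributed_kmer_count(reads, k=3, nprocs=4)
total = sum((b for b in buckets), Counter())
# total["ACG"] == 3: twice in the first read, once in the second
```

Because every occurrence of a k-mer hashes to the same owner, per-key counts are complete on one processor with no global synchronization, which is exactly the communication pattern that differs from simulation workloads.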
Evaluating the Potential of Disaggregated Memory Systems for HPC applications
Disaggregated memory is a promising approach that addresses the limitations
of traditional memory architectures by enabling memory to be decoupled from
compute nodes and shared across a data center. Cloud platforms have deployed
such systems to improve overall system memory utilization, but performance can
vary across workloads. High-performance computing (HPC) is crucial in
scientific and engineering applications, where HPC machines also face the issue
of underutilized memory. As a result, improving system memory utilization while
understanding workload performance is essential for HPC operators. Therefore,
learning the potential of a disaggregated memory system before deployment is a
critical step. This paper proposes a methodology for exploring the design space
of a disaggregated memory system. It incorporates key metrics that affect
performance on disaggregated memory systems: memory capacity, local and remote
memory access ratio, injection bandwidth, and bisection bandwidth, providing an
intuitive approach to guide machine configurations based on technology trends
and workload characteristics. We apply our methodology to analyze thirteen
diverse workloads, including AI training, data analysis, genomics, protein,
fusion, atomic nuclei, and traditional HPC bookends. Our methodology
demonstrates the ability to comprehend the potential and pitfalls of a
disaggregated memory system and provides motivation for machine configurations.
Our results show that eleven of our thirteen applications can leverage
injection bandwidth disaggregated memory without affecting performance, while
one pays a rack bisection bandwidth penalty and two pay the system-wide
bisection bandwidth penalty. In addition, we also show that intra-rack memory
disaggregation would meet the application's memory requirement and provide
enough remote memory bandwidth.
Comment: The submission builds on the following conference paper: N. Ding, S. Williams, H.A. Nam, et al., Methodology for Evaluating the Potential of Disaggregated Memory Systems, 2nd International Workshop on RESource DISaggregation in High-Performance Computing (RESDIS), November 18, 2022. It is now submitted to the CCPE journal for review.
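A back-of-envelope version of the design-space question the methodology asks can be sketched as follows: given a node's local capacity and a workload's memory footprint, what fraction of the working set must be remote, and does the resulting remote traffic fit within the injection bandwidth? All numbers below are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sketch of the capacity/bandwidth screening step.

def remote_fraction(footprint_gib, local_gib):
    """Fraction of the working set that must live in remote memory."""
    return max(0.0, (footprint_gib - local_gib) / footprint_gib)

def fits_injection_bw(total_bw_gibs, remote_frac, injection_gibs):
    """Assume remote traffic scales with the remote access fraction."""
    return total_bw_gibs * remote_frac <= injection_gibs

frac = remote_fraction(footprint_gib=768, local_gib=512)    # 1/3 must be remote
ok = fits_injection_bw(total_bw_gibs=300, remote_frac=frac,
                       injection_gibs=200)                  # 100 <= 200
```

Applications that fail this injection-bandwidth check are the ones that, in the paper's terms, pay a rack or system-wide bisection bandwidth penalty instead.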
Unraveling the functional dark matter through global metagenomics
30 pages, 4 figures, 1 table, supplementary information. https://doi.org/10.1038/s41586-023-06583-7
Data availability: All of the analysed datasets along with their corresponding sequences are available from the IMG system (http://img.jgi.doe.gov/). A list of the datasets used in this study is provided in Supplementary Data 8. All data from the protein clusters, including sequences, multiple alignments, HMM profiles, 3D structure models, and taxonomic and ecosystem annotation, are available through NMPFamsDB, publicly accessible at www.nmpfamsdb.org. The 3D models are also available at ModelArchive under accession code ma-nmpfamsdb.
Code availability: Sequence analysis was performed using Tantan (https://gitlab.com/mcfrith/tantan), BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), LAST (https://gitlab.com/mcfrith/last), HMMER (http://hmmer.org/) and HH-suite3 (https://github.com/soedinglab/hh-suite). Clustering was performed using HipMCL (https://bitbucket.org/azadcse/hipmcl/src/master/). Additional taxonomic annotation was performed using Whokaryote (https://github.com/LottePronk/whokaryote), EukRep (https://github.com/patrickwest/EukRep), DeepVirFinder (https://github.com/jessieren/DeepVirFinder) and MMseqs2 (https://github.com/soedinglab/MMseqs2). 3D modelling was performed using AlphaFold2 (https://github.com/deepmind/alphafold) and TrRosetta2 (https://github.com/RosettaCommons/trRosetta2). Structural alignments were performed using TMalign (https://zhanggroup.org/TM-align/) and MMalign (https://zhanggroup.org/MM-align/). All custom scripts used for the generation and analysis of the data are available at Zenodo (https://doi.org/10.5281/zenodo.8097349).
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities1,2. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database3. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
With the institutional support of the 'Severo Ochoa Centre of Excellence' accreditation (CEX2019-000928-S). Peer reviewed.
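The reference-free grouping step can be illustrated with a toy example (names invented): pairwise similarity hits between proteins form a graph, and families are read off its structure. The paper uses Markov clustering via HipMCL; plain connected components via union-find stand in for it here.

```python
# Toy illustration of grouping similarity hits into protein families.

def clusters(edges):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:           # union the endpoints of every hit
        parent[find(a)] = find(b)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)

hits = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
fams = clusters(hits)
# largest family: {"p1", "p2", "p3"}
```

In the study, only clusters exceeding 100 members were retained as novel families; the same size cutoff would be a final filter on `fams`.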